Introduction

The purpose of this supplementary file is to reproduce data analysis of artefact compositions and isotopic signals of objects from the Duchcov hoard from the article Exceptional ritual or a production deposit? The Duchcov hoard (Bohemia) as a proxy to ´Celtic migrations´ in the 4th century BC.

R version 3.6.3 was used (R Core Team, 2020) to run the analysis. Following R packages are used: tidyverse family of packages (Wickham et al., 2019), ggforce (Pedersen, 2019), ggbiplot (Vu, 2011), ggridges (Wilke, 2020), cluster (Maechler et al., 2019), pdist (Wong, 2013), StatMatch (D’Orazio, 2019), MASS (Venables and Ripley, 2002), cowplot (Wilke, 2019), corrplot (Wei and Simko, 2017), knitr (Xie, 2020a), DT (Xie et al., 2020), bookdown (Xie, 2020b), grid (R Core Team, 2020), gridExtra (Auguie, 2017).

Composition analysis

Only Co, Ni, Zn, As, Ag, Sb, Pb element compositions of 49 analyzed artefacts from Duchcov hoard were used for the analyses.

## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     1.7159 1.1722 0.9935 0.9010 0.66765 0.51310 0.41717
## Proportion of Variance 0.4206 0.1963 0.1410 0.1160 0.06368 0.03761 0.02486
## Cumulative Proportion  0.4206 0.6169 0.7579 0.8739 0.93753 0.97514 1.00000

Grouping in the compositions

Elbow method and silhouette method are used to determine number of clusters in the dataset.

Scree plot

Figure 1: Scree plot

Silhouettes

Figure 2: Silhouettes

Both methods suggest optimal number of clusters in the dataset is 4 to 5, we chose to use k-means clustering with 4 clusters.

Mean values for different element compositions in the clusters 1 to 4:

Table 1: Mean values for element compositions
kmeans Co Ni Zn As Ag Sb Pb
1 184.7500 0.1987 0.0733 0.2133 0.0295 0.1443 2.3779
2 38.9677 0.0377 0.0548 0.0939 0.0112 0.0412 0.2274
3 0.0000 0.0263 1.8100 0.0567 0.0057 0.0370 0.1713
4 971.0000 0.0710 0.1900 0.5400 0.0303 0.0623 0.1907

Lead isotopes

Ore sources

Here we compare the distributions of Duchcov k-means clusters with distributions of known sources based on lead isotopes.

Points removed as outliers are highlighted in red in the following figures. Outliers are defined as points lying further than 2 interquartile ranges from the upper quartile of the Mahalanobis distances from the data distribution center in a multidimensional space defined by isotopic signals \(^{206}Pb/^{204}Pb\), \(^{207}Pb/^{204}Pb\), \(^{208}Pb/^{204}Pb\), \(^{207}Pb/^{206}Pb\) and \(^{208}Pb/^{206}Pb\).

Ore sources outliers

Figure 3: Ore sources outliers

Ore sources outliers

Figure 4: Ore sources outliers

Ore sources outliers

Figure 5: Ore sources outliers

Overall view of distributions of lead isotope values.

Ore sources overlayed with Duchcov hoard data

Figure 6: Ore sources overlayed with Duchcov hoard data

Ore sources overlayed with Duchcov hoard data

Figure 7: Ore sources overlayed with Duchcov hoard data

Ore sources overlayed with Duchcov hoard data

Figure 8: Ore sources overlayed with Duchcov hoard data

Several visual methods are explored in order to determine whether the lead isotopic signal from Duchcov hoard objects grouped by k-means clustering correspond to known sources. At first, areas corresponding to lead isotope signals from different geographic regions are constructed in isospaces using concave hulls (Gombin et al., 2017, concaveman algorithm) of the point distributions for given geographic regions.

Kernel density estimation for each of the source regions helps in identifying to what extent the point distribution of Duchcov hoard objects fits the density distribution in given regions. kde2d algorithm of MASS package is used (Venables and Ripley, 2002).

Kernel density estimation of ore sources

Figure 9: Kernel density estimation of ore sources

Kernel density estimation of ore sources

Figure 10: Kernel density estimation of ore sources

Kernel density estimation of ore sources

Figure 11: Kernel density estimation of ore sources

Euclidean distances

Euclidean distance between all the points in given geographic region and k-means clusters from Duchcov hoard is used as one of the proxies to identify possible sources of the lead isotope ratios \(^{208}Pb/^{204}Pb\), \(^{207}Pb/^{204}Pb\), \(^{206}Pb/^{204}Pb\), \(^{208}Pb/^{206}Pb\) and \(^{207}Pb/^{206}Pb\).

The problem with using Euclidean distances is that the multivariate shape of the data distribution is not taken into account, thus Mahalanobis distance (see further) seems as a better fit for the task, even though Euclidean distances are commonly used (e.g. Ling et al., 2014).

Discriminant analysis

Manova is used in order to determine whether there are significant differences between group means for the distinct regions. Most of the group means are significantly different in most of the dimensions defined by different lead isotopes. Region means that are not sig. different are Slovakia, Easter Carpathians and South-Eastern Alpes (see results below). Pillai test on MANOVA is significant, thus the region means combined for all the variables (lead isotopic values) are significantly different.

Only regions with median Mahalanobis distance smaller than 5 are included in LDA with exception of Slovakia. Slovakia has leser Mahalanobis distance, but is removed from the test, because according to MANOVA, it is the only one group that is not significantly different.

## Response Pb207_206 :
## 
## Call:
## lm(formula = Pb207_206 ~ sources_subset$Region)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.083573 -0.004612 -0.000499  0.006032  0.051416 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         0.838894   0.001731 484.675  < 2e-16 ***
## sources_subset$RegionSaarland       0.012405   0.002171   5.715 1.84e-08 ***
## sources_subset$RegionE Alps        -0.027991   0.002152 -13.005  < 2e-16 ***
## sources_subset$RegionSE Alps        0.021184   0.002178   9.724  < 2e-16 ***
## sources_subset$RegionErzgebirge     0.013404   0.002222   6.032 3.04e-09 ***
## sources_subset$RegionE Carpathians -0.005909   0.002320  -2.547   0.0112 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01329 on 530 degrees of freedom
## Multiple R-squared:  0.6338, Adjusted R-squared:  0.6304 
## F-statistic: 183.5 on 5 and 530 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb208_206 :
## 
## Call:
## lm(formula = Pb208_206 ~ sources_subset$Region)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156006 -0.007665  0.000987  0.010358  0.099886 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         2.070344   0.003227 641.553  < 2e-16 ***
## sources_subset$RegionSaarland       0.018247   0.004047   4.509 8.04e-06 ***
## sources_subset$RegionE Alps        -0.031838   0.004013  -7.934 1.27e-14 ***
## sources_subset$RegionSE Alps        0.038589   0.004062   9.501  < 2e-16 ***
## sources_subset$RegionErzgebirge     0.022366   0.004143   5.398 1.02e-07 ***
## sources_subset$RegionE Carpathians -0.008377   0.004326  -1.936   0.0534 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02479 on 530 degrees of freedom
## Multiple R-squared:  0.4945, Adjusted R-squared:  0.4898 
## F-statistic: 103.7 on 5 and 530 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb206_204 :
## 
## Call:
## lm(formula = Pb206_204 ~ sources_subset$Region)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2068 -0.1429  0.0097  0.1046  2.3620 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        18.68180    0.04445 420.281  < 2e-16 ***
## sources_subset$RegionSaarland      -0.33292    0.05575  -5.972 4.30e-09 ***
## sources_subset$RegionE Alps         0.72722    0.05527  13.157  < 2e-16 ***
## sources_subset$RegionSE Alps       -0.46861    0.05595  -8.376 4.93e-16 ***
## sources_subset$RegionErzgebirge    -0.37794    0.05707  -6.622 8.68e-11 ***
## sources_subset$RegionE Carpathians  0.11216    0.05959   1.882   0.0604 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3414 on 530 degrees of freedom
## Multiple R-squared:  0.6252, Adjusted R-squared:  0.6216 
## F-statistic: 176.8 on 5 and 530 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb207_204 :
## 
## Call:
## lm(formula = Pb207_204 ~ sources_subset$Region)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.130548 -0.012688  0.000218  0.011287  0.116907 
## 
## Coefficients:
##                                     Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                        15.667000   0.003148 4976.260   <2e-16 ***
## sources_subset$RegionSaarland      -0.046937   0.003948  -11.888   <2e-16 ***
## sources_subset$RegionE Alps         0.055741   0.003915   14.238   <2e-16 ***
## sources_subset$RegionSE Alps       -0.004218   0.003963   -1.064   0.2876    
## sources_subset$RegionErzgebirge    -0.067402   0.004042  -16.675   <2e-16 ***
## sources_subset$RegionE Carpathians -0.012541   0.004221   -2.971   0.0031 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02418 on 530 degrees of freedom
## Multiple R-squared:  0.747,  Adjusted R-squared:  0.7446 
## F-statistic:   313 on 5 and 530 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb208_204 :
## 
## Call:
## lm(formula = Pb208_204 ~ sources_subset$Region)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1535 -0.1453  0.0301  0.1258  1.8033 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        38.66768    0.04576 845.053  < 2e-16 ***
## sources_subset$RegionSaarland      -0.34485    0.05739  -6.009 3.47e-09 ***
## sources_subset$RegionE Alps         0.87200    0.05690  15.325  < 2e-16 ***
## sources_subset$RegionSE Alps       -0.26015    0.05759  -4.517 7.73e-06 ***
## sources_subset$RegionErzgebirge    -0.36390    0.05875  -6.194 1.18e-09 ***
## sources_subset$RegionE Carpathians  0.08348    0.06134   1.361    0.174    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3515 on 530 degrees of freedom
## Multiple R-squared:  0.6346, Adjusted R-squared:  0.6312 
## F-statistic: 184.1 on 5 and 530 DF,  p-value: < 2.2e-16
##                        Df Pillai approx F num Df den Df    Pr(>F)    
## sources_subset$Region   5 1.6176   50.692     25   2650 < 2.2e-16 ***
## Residuals             530                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Linear discriminant analysis is then used to separate the regions.

## Call:
## lda(Region ~ ., data = sources_subset[, c("Region", "Pb207_206", 
##     "Pb208_206", "Pb206_204", "Pb207_204", "Pb208_204")], prior = as.double(paste(rep(1/n_levels_sources, 
##     n_levels_sources), sep = ",")))
## 
## Prior probabilities of groups:
##        Valais      Saarland        E Alps       SE Alps    Erzgebirge 
##     0.1666667     0.1666667     0.1666667     0.1666667     0.1666667 
## E Carpathians 
##     0.1666667 
## 
## Group means:
##               Pb207_206 Pb208_206 Pb206_204 Pb207_204 Pb208_204
## Valais        0.8388941  2.070344  18.68180  15.66700  38.66768
## Saarland      0.8512989  2.088591  18.34888  15.62006  38.32283
## E Alps        0.8109029  2.038506  19.40902  15.72274  39.53968
## SE Alps       0.8600777  2.108933  18.21319  15.66278  38.40752
## Erzgebirge    0.8522983  2.092711  18.30386  15.59960  38.30378
## E Carpathians 0.8329849  2.061967  18.79396  15.65446  38.75116
## 
## Coefficients of linear discriminants:
##                  LD1        LD2           LD3        LD4         LD5
## Pb207_206  42.302498 1133.49216 -1647.7969713  5038.1390 -1032.98177
## Pb208_206  73.962760 -315.07897   642.9876399 -3150.1984   921.79432
## Pb206_204  11.030219   10.84895    -0.8771077  -118.8825    57.04978
## Pb207_204 -56.005454  -17.89743   102.1053150  -245.4064    54.79130
## Pb208_204  -3.933951   15.60876   -36.4779825   161.5616   -50.66278
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4    LD5 
## 0.5872 0.3696 0.0343 0.0085 0.0004

The LDA works, but the there are problems in classification of points, meaning it is not accurate. (LDA is limited to the data distributions of regions that are the most proximal to the points of interest according to Mahalanobis distance). Using the LDA to predict the regions, we get a lot of error, see the confusion matrix below. How to read it: rows are the regions the point is from and columns are regions predicted based on the LDA.

Table 2: Confusion matrix
Valais Saarland E Alps SE Alps Erzgebirge E Carpathians Sum
Valais 37 3 3 3 1 12 59
Saarland 3 82 0 2 15 1 103
E Alps 21 0 81 2 0 4 108
SE Alps 9 1 0 91 0 0 101
Erzgebirge 0 16 0 1 72 2 91
E Carpathians 20 1 0 0 6 47 74
Sum 90 103 84 99 94 66 536
## [1] 0.76
##                
##                 Valais Saarland E Alps SE Alps Erzgebirge E Carpathians
##   Valais          0.63     0.05   0.05    0.05       0.02          0.20
##   Saarland        0.03     0.80   0.00    0.02       0.15          0.01
##   E Alps          0.19     0.00   0.75    0.02       0.00          0.04
##   SE Alps         0.09     0.01   0.00    0.90       0.00          0.00
##   Erzgebirge      0.00     0.18   0.00    0.01       0.79          0.02
##   E Carpathians   0.27     0.01   0.00    0.00       0.08          0.64

Prediction of Regions based on data on points from Duchcov hoard.

Table 3: Counts of points per different sources
Valais Saarland SE Alps Erzgebirge E Carpathians Sum
1 8 0 1 0 2 11
2 7 13 0 8 3 31
3 0 1 0 2 0 3
4 0 1 0 1 1 3
Sum 15 15 1 11 6 48
Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 12: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 13: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 14: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Burial grounds

References

Auguie, B., 2017. GridExtra: Miscellaneous functions for "grid" graphics.

D’Orazio, M., 2019. StatMatch: Statistical matching or data fusion.

Gombin, J., Vaidyanathan, R., Agafonkin, V., 2017. Concaveman: A very fast 2D concave hull algorithm.

Ling, J., Stos-Gale, Z., Grandin, L., Billström, K., Hjärthner-Holdar, E., Persson, P.-O., 2014. Moving metals ii: Provenancing scandinavian bronze age artefacts by lead isotope and elemental analyses. Journal of Archaeological Science 41, 106–132. https://doi.org/https://doi.org/10.1016/j.jas.2013.07.018

Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K., 2019. Cluster: Cluster analysis basics and extensions.

Pedersen, T.L., 2019. Ggforce: Accelerating ’ggplot2’.

R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Venables, W.N., Ripley, B.D., 2002. Modern applied statistics with s, Fourth. ed. Springer, New York.

Vu, V.Q., 2011. Ggbiplot: A ggplot2 based biplot.

Wei, T., Simko, V., 2017. R package "corrplot": Visualization of a correlation matrix.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L.D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T.L., Miller, E., Bache, S.M., Müller, K., Ooms, J., Robinson, D., Seidel, D.P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., Yutani, H., 2019. Welcome to the tidyverse. Journal of Open Source Software 4, 1686. https://doi.org/10.21105/joss.01686

Wilke, C.O., 2020. Ggridges: Ridgeline plots in ’ggplot2’.

Wilke, C.O., 2019. Cowplot: Streamlined plot theme and plot annotations for ’ggplot2’.

Wong, J., 2013. Pdist: Partitioned distance function.

Xie, Y., 2020a. Knitr: A general-purpose package for dynamic report generation in r.

Xie, Y., 2020b. Bookdown: Authoring books and technical documents with r markdown.

Xie, Y., Cheng, J., Tan, X., 2020. DT: A wrapper of the javascript library ’datatables’.